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ABSTRACT 

We present a hierarchical Bayesian method for fitting infrared spectral energy distributions (SEDs) 
of dust emission to observed fluxes. Under the standard assumption of optically thin single temper- 
ature (T) sources the dust SED as represented by a power-law modified black body is subject to a 
strong degeneracy between T and the spectral index j3. The traditional non-hierarchical approaches, 
typically based on x 2 minimization, are severely limited by this degeneracy, as it produces an artifi- 
cial anti-correlation between T and /3 even with modest levels of observational noise. The hierarchical 
Bayesian method rigorously and self-consistently treats measurement uncertainties, including calibra- 
tion and noise, resulting in more precise SED fits. As a result, the Bayesian fits do not produce any 
spurious anti-correlations between the SED parameters due to measurement uncertainty. We demon- 
strate that the Bayesian method is substantially more accurate than the % 2 fit in recovering the SED 
parameters, as well as the correlations between them. As an illustration, we apply our method to 
Herschel and submillimcter ground-based observations of the star-forming Bok globule CB244. This 
source is a small, nearby molecular cloud containing a single low-mass protostar and a starless core. 
We find that T and /? are weakly positively correlated - in contradiction with the x 2 fits, which 
indicate a T — /3 anti-correlation from the same data-set. Additionally, in comparison to the x 2 fits 
the Bayesian SED parameter estimates exhibit a reduced range in values. 

Subject headings: infrared: ISM — ISM: dust — ISM: structure — methods: data analysis — methods: 
statistical — stars: formation 



1. INTRODUCTION 

Dust provides an important observational avenue for 
investigating a variety of astrophysical environments. 
Though dust amounts to only a fraction (^1/100) of the 
gaseous mass, it is an efficient radiator and thus provides 
clues to the energy content of the interstellar medium 
(ISM). Dust grains are heated by a variety of sources, 
such as stellar radiation or collisions with gas, and re- 
radiates most of this energy as thermal emission. For 
temperatures characteristic of the ISM, thermal emission 
from dust is predominantly at far infrared (IR) and sub- 
millimeter wavelengths. Consequently, properties of the 
far-IR and sub-mm spectral energy distribution (SED) 
can provide vital information about the physical state 
of observed systems. For example, the amount of emer- 
gent IR flux scales proportionally with the star formation 
rate in (ultra) luminous infrared galaxies ([U]LIRGs, e.g., 
lLagache~et al. 2005); SED shapes indicate the evolution- 
ary s tate of star forming cores within molecular clouds 
(e.g.. lAdams et aLlll987f k and grain evolution in proto- 
plane tary disks can be m odeled using measured SEDS 
(e.g.. iWatson et al.l 12007t h Understanding dust SEDs 
therefore represents a crucial step in numerous research 
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fields spanning a wide range of spatial scales. Ground 
and space based telescopes, such as IRAS, ISO, Spitzer, 
Herschel, Planck, SCUBA(2), MAMBO(2), SABOCA, 
and new instruments on SOFIA have and continue to 
measure IR emission, with the aim of quantifying dust 
SEDs. 

At far-IR wavelengths (A ^ 60 /Ltm) the shape of SEDs 
due to thermal emission from dust is empirically found 
to be well represented as a Planck function, B V (T), eval- 
uated at the dust te mperature T mo dified by a power 
law in frequency, v^ (Hildcbrandl ll983l ): 



S v = nNn [ — ) B V {T). 



(1) 



Here, Vt is the solid angle of the observing beam, N is 
the column density, B V (T) is the Planck function, and 
k,o(v/v$)P is the opacity of the emitting dust. Note that 
To = Nkq is the optical depth at frequency vq. An 
essential assumption in Equation 1 is that dust is op- 
tically thin, so that t(v) <C 1. Following convention, 
we assume that th e opacity Kg is known, and employ 
K = 0.009cm 2 /g (jOssenkopf fc Henningl Il994[ h which 
accounts for the dust-to-gas ratio, so that the free param- 
eter in the fit is the gas column density N. The spectral 
index (3 determines the opacity k„ of the dust, and en- 
codes information about grain composition. Usually, f3 
is found to be ^2 for silicate and/or carb onaceous grain 
comp osition common in the diffuse ISM (jDraine fc Led 
H98l . 

Numerous observational investigations have focused on 
measuring the value of f3. Most observational analyses 
estimate T and (3 by employing a least-squares (x 2 ) SED 
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fit. However, due to the degeneracy between T and 0, 
known since the earliest efforts to derive dust proper- 
ties from a lim ited number of far -IR and submillimeter 
measurements (iKeene et al.lll980T ). \ 2 fits underestimat- 
ing; will natura lly overestimate T (|Blain et al.l 120031 : 
iSajina et~a l. 2006), or vice versa. Erroneous estimates of 
T and/or may arise due to the inaccurate assump- 
tion of a constant t emperature along the line-of-sight 
(jShettv et al.l l2009bh . or uncertainty in the flux mea- 
surements. Additionally, the presence of internal sources 
may also yield an artificial T — 8 ant i-c orrelation in 
the est imated SED (jMalinen et al.l 120111 ). iShettv et all 
(|2009aT> demonstrate through Monte Carlo simulations 
that modest noise levels can lead to spurious T — anti- 
correlations. Employing models of isothermal sources 
with constant 0, they show that due to uncertainties 
as low as 5%, \ 2 fits produce erroneous T and esti- 
mates. Moreover, the fits result in an artificial T — (3 
anti-correlation which is remarkably similar to the trend 
derived from observational investigations. IShettv et al.l 
(2009a) conclude that x 2 SED fits cannot reliably reveal 
the true T— 8 correlation under realistic conditions where 
noise is present. 

This T — (3 degeneracy may have obscured the re- 
lationship between dust temperature and composition 
present in astrophysical sources. Observational investi- 
gations employing careful statistical analyses also con- 
clude that the T — 8 anti-correl a tion d erived from x 2 fits 
may be spurious. iSchnee et al.l (|2010l ) demonstrate that 
though a x 2 fit to fluxes from the starless core TMC- 
1C produces the anti-correlation, due to the degeneracy 
between T and 8 the data is also consistent with a con- 
stant th roughout the sourc e. Using radiative transfer 
modeling, Uuvela et al.l ()201lD suggest that line-of-sight 
temperature variations may be responsible for the x 2 es_ 
timate of a decrease in (3 towards an internally heated 
core. 

Accurately measuring and T is essential for under- 
standing grain growth and evolution. Under the dust 
coagulation scenario, a higher frequency of dust agglom- 
eration in dense regions results in increased grain sizes. 
In pro toplaneta ry disks , is usually measured to be ;$ 
1 (e.g. | Mivake fc Nakagawa 1993]; iMannings fe Emerson! 
119941 : Ibrainc 2006) , lower than t he larger scale ISM value 
of ~2 ([Draine fc Led 11984 . iGoldsmith et al.l (1997) 
found some indications of a decrease in towards higher 
density gas in the Orion molecular cloud. In broad terms, 
temperatures are generally lower in higher density re- 
gions (away from embedded sources) due to more ef- 
fective shielding from the ambient interstellar radiation 
field. Thus, in the dust coagulation scenario the decrease 
in towards denser regions should be associated with a 
decrease in temperature and an increase in density. 

On the other hand, x 2 SED fitting from various 
sources - from starless cores to entire galaxies - sug- 
gests that increases with decreasing T. Using IRAS 
and balloon-b orne P R ONA OS observations of various 
sources, iDupac et al.1 (|2003l ) found an inverse correla- 
tion between the spectral index and temperature, with 
ran ging from 0. 8 to 2 .4, and T ranging from 11 to 
80 K. IDupac et al.l (|2003l ) suggested that the hyperbolic 
shape of the T — anti-correlation may have a physi- 
cal basis, such as a variation of dust composition with 



temperature. lYang fe Phillips! (|2007l ) f ound a similar 
trend from a sample of LIRGs, although lHavward et al.l 
(|2011f) argue that the optically thin isothermal SED 
model is inappropriate for extra-galactic sources, at 
least for sub-mm galaxies. More recently, SEDs derived 
from Herschel and Planck observ ations have also con- 
veyed T — anti-co r relations (e.g. lAnderson et al.1120101 : 
iParadis et ail 120101: iPlanck Collaboration et al.l 120 111) . 
These results are interpreted to be in agreement with 
laboratory studies: laboratory measurements of some 
amo rphous materials also show a T — anti -correlation 
(e.g. lAgladze et al.lll996l iBoudet et al.ll2005t l. However, 
those laboratory results depend on grain composition 
and wavelength, and do not consider how variations in 
(volume) densities similar to the range found in the ISM 
can affect the results. The comparison is further compli- 
cated by the spurious anti-correl ation arising in x 2 fits 
due to noise (jShettv et al.l l2009al ) . and by variations of 
temperature and dus t composition along the line-of-sight 
(IShettv et al.l I2009ri iMalinen et all 120111: Uuvela et all 
2011). 

Properly treating the degeneracy between T and is 
thus a necessity for accurately fitting dust SEDs and 
assessing any variation of grain properties with tem- 
perature. The motivation for this work stems from 
the T — anti-correlation found in observational anal- 
ysis which appear to be very similar to the spurious 
anti-correlation due solely to statistical uncertainties 
(jShettv et al. 2009a). The main goal of this work is to de- 
velop a SED fitting method that rigorously treats statis- 
tical uncertainties so that spurious T—0 anti-correlations 
are avoided. To satisfy these conditions, we introduce a 
hierarchical Bayesian approach for fitting an ensemble of 
dust SEDs. 

Hierarchical! modeling is a statistical framework that 
was developed to ha ndle data analysis pr oblems with 
multiple stages (e.g.. iGelman fc Hifll 120071) . Hierarchi- 
cal models are preferred for complex data analysis prob- 
lems, as they are able to effectively handle multiple 
sources of uncertainty at all stages of the data anaysis. 
Sources of statistical uncertainties include random noise 
and calibration (correlated) errors, which can contribute 
both multiplicative and additive components. Hierarchi- 
cal models can account for correlated uncertainties and 
degeneracies between parameters, and may thus avoid 
any spurious correlations. Moreover, Bayesian methods 
calculate the probability distribution of the parameters 
given the measured data, so that the uncertainties re- 
turned by the inference are rigorous and well-defined. 
This ensures that all sources of measurement error are 
properly incorporated into the uncertainties in the es- 
timated parameters. This is in stark contrast to more 
traditional frequentist and approximate methods, such 
as the propagation of errors or the treatment of the 
best-fit values as the true values. Frequentist approaches 
can lead to biases and incorrect quantification of uncer- 
tainty in complex problems. Bayesian methods have been 
shown to avoid such biases for numerous problems in a 
variety of scientific disciplines, and, as we demonstrate in 
this work, can effectively handle the degeneracy between 
T and in SED fitting. 

This paper is organized as follows. In the next sec- 

2 Also called "multilevel" modeling. 
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tion, we describe our hierarchical Bayesian model, and 
contrast it with the x 2 fit- In Section 3, we test the 
Bayesian method on model data, and compare the re- 
sults to a. x 2 fit- We apply the Bayesian fit to observed 
fluxes from starforming core CB244 in Section 4. After 
a brief discussion on the interpretation of our results in 
Section 5, we provide a summary in Section 6. 

2. THE HIERARCHICAL STATISTICAL MODEL 

In this section, we present a hierarchical Bayesian tech- 
nique that simultaneously estimates the values of the col- 
umn density N, spectral index ft, and temperature T, as 
well as their joint distribution, directly from a set of ob- 
served fluxes. Although hierarchical Bayesian m odeling 
is be c oming more common in astrophysics (e.g.. iLoreda 
12004 lKellyH2007t iHogg et alJl2010t Mandel et alJl2011f ). 
it is not nearly as widely employed in most astrophysical 
fields as traditional single-level methods. In order to al- 
low the reader to become familiar with the hierarchical 
modeling approach, we begin by providing an overview 
of the method in Section 2.1. To guide readers familiar 
with single-level frequentist methods, we also contrast it 
with the minimized-x 2 fit, which is commonly employed 
for estimating N, ft, and T. A thorough description of 
the particular hierarchical Bayesian mode l we develop is 
subseq uently given in Sections 2.2 and 2.3. iGelman et al.l 
(2004) provides a clear overview of Bayesi an me t hods, in- 
cluding hierarchical models, and ICarroll et al. (2006) is 
a good reference on methods for dealing with measure- 
ment errors; both references also discuss Markov Chain 
Monte Carlo (MCMC) methods. 

The power of Bayesian analysis is reflected in its out- 
put, which for our problem is a probability distribution 
for each value of N, ft, and T for each pixel, as well as for 
the parameters defining the joint distibution of N, ft, and 
T, given the measured data0 The probability distribu- 
tion of the parameters given the observed data is called 
the posterior distribution, and provides a complete and 
straight-forward description of our uncertainty in the pa- 
rameters. This is contrasted with the output of the fre- 
quentist methods, such as x 2 or maximum-likelihood es- 
timates, as frequentist methods provide a point estimate 
of the parameters and sometimes an estimate of a con- 
fidence region. However, while Bayesian posteriors are 
exact, the frequentist analogues, i.e., confidence regions, 
are often difficult to estimate for complex problems, such 
as the one we are adressing. 

2.1. Basics of Hierarchical Modeling of Dust SEDs 

Traditionally, parameters for SED models have been 
estimated by minimizing x 2 ■ More recently, non- 
hierarchical Bayesian methods have been used to es- 
timate the paramet ers for individual pixels or sources 
(|Paradis et alj[2010h . These methods are appropriate if 
the goal is to estimate the SED parameters for a single 
pixel or source, as there is only one level to the data 
analysis problem: the measured flux values are gener- 
ated from the SED parameters for an individual pixel 
or source. In this case, one only needs a measurement 

3 Often in practice, Bayesian algorithms do not output the prob- 
ability distribution of the parameters directly, but rather output a 
set of random draws of the parameters from their posterior proba- 
bility distribution. 



model for the data, e.g., that the measured data are ob- 
tained by contaminating the SED at the observational 
wavelengths with Gaussian measurement noise. How- 
ever, for most scientific problems of interest, the data 
analysis problem has multiple stages, for which tradi- 
tional non-hierarchical methods, such as those based on 
X 2 , are not applicable and lead to biases. Within the 
context of the problem that we are addressing, namely 
the distribution of parameters for dust SEDs, there are 
at least two levels to the data analysis problem. First, 
there is the level corresponding to how the SED param- 
eters for individual pixels or sources are generated from 
the distribution of these parameters. Second, there is the 
level corresponding to how the measured fluxes are gen- 
erated from the SED parameters for individual pixels or 
sources. The point of hierarchical modeling is to model 
and fit both levels simultaneously. 

Under the traditional non-hierarchical approach, sta- 
tistical inference at the first (distribution) level would 
be performed using the best-fit results from the second 
(individual SED) level; i.e., the distribution of the SED 
parameters is estimated directly from the best-fit values 
for individual pixels or sources obtained by minimizing 
X 2 , effectively treating the best-fit values as if they were 
the true values. However, these best-fit values are esti- 
mated with error. The distribution of the estimates is 
the convolution of the distribution of the true values of 
the SED parameters with the error distribution of their 
estimates. Therefore, the distribution of the quantities 
estimated using non-hierarchical methods will always be 
a biased estimate of the distribution of the true values, or 
rather the distribution of the values that would have been 
obtained in the absence of measurement errors. Within 
the context of the ft-T relationship, this implies that the 
distribution of ft and T estimated using the x 2 -based es- 
timates will always be biased toward an anti-correlation, 
as the error distribution of ft and T is anti-correlated. 
This is a mathematical fact and is true for any value 
of the S/N, although distributions inferred from higher 
S/N data will not be as biased. The reason for this is 
because the traditional non-hierarchical methods do not 
effectively treat errors at all levels of the data analysis 
problem. 

Within the context of estimating SED parameters and 
their distribution, a hierarchical model is constructed by 
invoking a model for the distribution of SED parameters, 
as well as a model for the measured data. The model 
for the distribution of SED parameters has its own set 
of free parameters, and all that is assumed is the func- 
tional form. For example, one could assume that the 
distribution of SED parameters is a Gaussian distribu- 
tion, and then the free parameters would be the mean 
and covariance of the SED parameters. Both the model 
for the distribution of the SED parameters, and the SED 
parameters for individual pixels or sources, are fit simul- 
taneously. This approach is able to effectively handle un- 
certainty at all levels of the data analysis problem, and 
thus does not suffer from the biases that traditional non- 
hierarchical approaches do when estimating the distribu- 
tion of SED parameters. Moreover, because one assumes 
a model for the distribution of the SED parameters, and 
fits all of the pixels or sources simultaneously, one is able 
to obtain more precise estimates of the SED parameters 
for individual sources as all of the information is pooled 
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together. 

The difference between the traditional and hierachical 
modeling approaches may be more easily understood by 
using an analogy with astronomical imaging. As an ex- 
ample, consider the /3-T distribution. The distribution of 
the estimated values of j3 and T is the convolution of the 
distribution of the true values with their banana-shaped 
error distribution. Therefore, the error distribution of /3 
and T can be thought of as a 'PSF' which acts on the 
image of the distribution of the true values of the param- 
eters in the /3-T plane, causing the observed image of the 
parameters in the /3—T plane to be a blurred version of 
the true image. If all pixels or sources have the same 
value of /3 and T, then the image of the true values in 
the /3-T plane is that of a 'pointsource', and the image 
of the estimated values is just the banana-shaped 'PSF' 
(i.e., the error distribution). If there is any spread in the 
/3 and T values, then the true image in the /3-T plane is 
that of an 'extended source'. In order to effectively esti- 
mate the image of the true distribution of the parameters 
in the /3-T plane, it is necessary to deconvolve the image 
of the observed distribution with the error distribution. 
As we will show in § 12.31 the methods which employ 
X 2 -based estimates implicitly assume that the distribu- 
tion of the parameters in the /3-T plane is uniform over 
all possible values. Similarly, non-hierarchical Bayesian 
methods which treat the pixels or sources independently 
also employ this assumption. Because of the assumption 
of a uniform distribution, the non-hierarchical methods 
are not able to deconvolve the image of the estimates in 
the /3-T plane from the error distribution as both the im- 
age of the true and estimated distribution are expected 
to look the same under this assumption. Therefore, both 
the x" 2 an d non-hierarchical Bayesian methods do not re- 
cover the true image of the distribution and are biased. 
However, the hierarchical modeling framework is able to 
do the deconvolution because it also employs a model for 
the 'source' image, i.e., the image of the true distribution 
in the /3-T plane. Therefore, the hierarchical model sig- 
nificantly improves on the simpler methods, and provides 
more accurate estimates of the /3-T distribution. 

Another advantage of hierarchical modeling is that is it 
possible to divide the measurement process into multiple 
levels. This therefore enables us to treat both additive 
measurement noise and multiplicative calibration errors. 
Traditional methods usually assume a simple single-level 
measurement model, and dealing with multiple kinds of 
measurement error can be difficult for these approaches. 
However, it is easy to develop a hierarchical model which 
appropriately treats multiple levels or sources of mea- 
surement error. 

In this work, we develop a hierarchical model for the 
analysis of far-IR SEDs of astronomical dust. We per- 
form Bayesian inference on our model, as the uncertain- 
ties at all levels of the model are exact and straight- 
forward to interpret under the Bayesian approach. While 
it is also possible to perform frequentist inference using a 
hierarchical model, we do not do this as the uncertainties 
on the estimated quantities are harder to estimate and 
interpret. Moreover, hierarchical models lend themselves 
natually to MCMC algorithms, so Bayesian methods are 
also computationally straight-forward. 

2.2. The Measurement Model 



We model the dust SED as a modified black body, 
given by Equation ([1]). In addition, we multiply Equa- 
tion ([lj by a 'color-correction' factor, which corrects the 
measured flux values for the varying SED across the pho- 
tomctcric band, and, if available, is usually given in a 
tabulated form in the observer's manual for an instru- 
ment. The color-correction factor is given as a function 
of /3 and T. 

The SED fomulated by Equation ([T]) is applicable to 
optically thin dust emission along lines-of-sight with a 
single temperature. Consequently, sources with a sig- 
nificant amount of high-density material where dust be- 
comes optically thick, and/or with large temperature 
gradients, will exhibit SEDs which cannot be accurately 
described by Equation ([T]). We will modify our technique 
in future work to implement a more realistic model for 
the SED in such cases; however, we note that the goal 
of this paper is to develop a method that minimizes the 
effects of statistical error on the scientific conclusion, and 
therefore isolates the systematic errors, allowing for more 
direct investigatations into their effects. We address the 
issue of systematic errors from inappropriate application 
of Equation [T] further in Section [5] 

For j = 1, . . . , m observing bands, a map is measured 
for the source having i = 1, ...,n pixels. Denote the 
frequency of the j th band as i/j, and denote the mea- 
sured flux density for the i th pixel observed at Vj as Sij . 
Assuming Equation ([1]), the measured flux densities are 
assumed to be related to the actual values according to 
the measurement equation 



Oij — djbi/j (-/Vj, pi, ±i) -j- £ij. 



(2) 



Here, e^ is the random measurement error of the flux 
density due to noise at frequency Uj for the i th pixel, and 
Sj is the calibration error for the band corresponding 
to Vj. Note that the calibration error is assumed to be 
positive and the same for each pixel in the j th band. 

We assume that the noise is independent between pix- 
els in the same map and over different observing bands, 
and independent of the calibration uncertainties. In ad- 
dition, we assume that the calibration errors are indepen- 
dent over the bands. Our assumption that the noise is 
independent between bands and independent of the cal- 
ibration error is likely true. However, it is not true that 
the noise is independent between pixels in the same band. 
This is because all images are smoothed to the same res- 
olution, therefore correlating the noise in nearby pixels. 
While this can in theory be accounted for, it is difficult 
to include in our statistical model and even more difficult 
to develop an efficient MCMC sampler that accounts for 
this. Therefore, for simplicity we ignore the correlations 
in the noise among neighboring pixels. In addition, the 
calibration errors are often correlated, and therefore our 
assumption that they are independent may be incorrect. 
However, for simplicity, and because a quantified sum- 
mary of the correlations in Sj is typically not available, 
we assume the calibration uncertainties are independent. 

We employ a robust statistical model for the measure- 
ment errors and calibration uncertainties. This is to en- 
sure that our conclusions are not severely affected by 
our assumptions regarding these errors, as robust models 
allow for outliers and other deviations from the model- 
ing assumptions. We model the distributions of ey and 
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logSj, denoted as p(tij) and p(logSj), respectively, as 
having a Student's i-distribution with degrees of freedom 
rfmeas and d ca i, zero mean, and scale parameters <jjj and 
Tj . The distribution of the measurement noise is then 

Pfaj) = 



r((d n 



l)/2) 



-(dmoa a + l)/2 



t (dmcas/^ J \ J ( *mcas^^. 



(3) 



and the distribution of the calibration errors is 



p(log^) = 

r((4ai 



l)/2) 



(log^) 2 



-(d cal + l)/2 



T{d cai /2)Jd ca ,\KTJ 



^cal 



(4) 



Here, T(-) is the Gamma Function. In this work we as- 
sume (imoas = 3 and d ca i = 3. The scale parameters, err- 
and Tj, define the amplitude of the noise and calibra- 
tion error distributions, respectively. In the limit that 
d — >• co, Equations ((3]) and (j4]) converge to a Gaussian 
distribution with standard deviations a^j and Tj , respec- 
tively. However, unlike the Gaussian distribution, the 
i-distribution has more probability in the tails of the 
distribution, making it robust against outliers; indeed, 
a ^-distribution is commonly used when one req uires a 
robust statistical model fe.g. JGelman et aLl l2004'). 

The i-distribution is often appropriate when the mea- 
surement errors are assumed to be Gaussian, but their 
variance is only estimated and therefore unknown. In 
this case, the i-distribution also incorporates our uncer- 
tainty on the amplitude of the noise and calibration er- 
rors. For example, modeling e^ as following a Student's 
i-distribution corresponds to assuming that the noise is 
Gaussian with standard deviation <jjj, but we estimate 
&ij with aij . This is an appropriate model as the values 
of a^ are typically estimated from the maps themselves, 
e.g., by taking the dispersion in the pixel values in a 
region of the image relatively free of emission. The am- 
plitude of the calibration uncertainties, tj, are usually 
available from an observer's manual, but still have some 
uncertainty associated with them as they are often esti- 
mated by comparing a model fit to a calibration source. 

When we assume that the unknown of- follows a Scaled 

Inverse-x 2 distribution^ with d mcas degrees of freedom 
and scale parameter of , then tij marginally follows a 
Student's i-distribution with d moas degrees of freedom 
and scale parameter <r?-. In other words, Equation @ is 
equivalent to the following: 



14" 

^ 2 
Vij' 



'N(0,afj) 
Tnv-x 2 (d„ 



3:of,-)- 



(5) 
(6) 



Here, the notation x\y ~ p(x\y) means that given y, x is 
drawn from the conditional probability distribution of x 
given y, p(x\y). In addition, N(p, V) is a Gaussian dis- 
tribution with mean /i and variance V, and Inv-x 2 (/, s 2 ) 
is a scaled Inverse-x 2 distribution with / degrees of free- 
dom and scale parameter s. 

4 The Inverse-x 2 distribution is very similar to a x 2 distribu- 
tion and is commonly used to represent uncertainty in a variance 
parameter. 



In Figure [TJ we show the scaled Inverse-x 2 distribution 
with 3 degrees of freedom. In addition, we compare the 
i-distribution with 3 degrees of freedom to a Gaussian 
distribution. We chose a value of d m cas = deal = 3 for 
our measurement error model because it is the small- 
est value for the degrees of freedom that ensures that 
the Student's i-distribution has a finite mean and vari- 
ance. A value of 3 degrees of freedom implicitly assumes 
that ~ 6% of the data will be outliers by more than 3a. 
Alternatively, a value of 3 degrees of freedom can be in- 
terpreted as assuming a factor of ~ 2 uncertainty on the 
amplitude of the noise. While this may be somewhat 
excessive, we prefer this conservative approach to make 
our results robust against inaccuracies in our statisti- 
cal model, as well as unknown systematic effects. This 
robustness is illustrated in Figure [TJ which shows that 
outliers are expected in the i measurement error model 
as opposed to the Gaussian model, and the outliers are 
therefore down- weighted to ensure that they do not have 
an excessive influence on the results. The results are not 
strongly affected by changes in the degrees of freedom 
which are less than an order of magnitude, and values of 
d < 1 are typical for robust models fe.g-. IGelman et all 
l200l . 

2.3. The Distribution Model and Posterior Distribution 

In this section we derive the probability distribution 
for the individual values of JVj , Ti , /3i , and the parameters 
for their joint distribution, given a set of measured dust 
maps, assuming the measurement model in § 12.21 This is 
called the 'posterior distribution', and is the basis for our 
Bayesian approach. In order to simultaneously estimate 
the values of N, T, and j3 for each pixel, as well as their 
joint distribution, we introduce the additional step of 
assuming a parameteric form for the joint distribution 
for N, T, and (3. Denote the set of parameters^ for the 
distribution of N, T, and j3 for the source as 9, and denote 
the distribution of these quantities as p(N, T,/3\6). Then, 
the posterior distribution is 

p(N, T, f3, 9\S) ex p(9)p(N, T, /9|0)p(S|N, T, /3), (7) 

where N, T, and /3 are vectors containing the values of 
column density, temperature, and spectral index for the 
n pixels, S is an n x m matrix containing the measured 
flux densities, and p(9) is the prior distribution on 9. 
The term p(S|N, T,/3) is the likelihood function of the 
measured data. The quantity p(N,T, (3\9) defines the 
distribution of N, T, and (3, while the likelihood function 
p(S|N,T, 0) is defined by the measurement model for 
the maps, i.e., how N, T and /? generate the measured 
flux densities for each map. 

The existence of the calibration uncertainties makes 
calculation of Equation ([7]) difficult, as there is no ana- 
lytical form for the likelihood function of the measured 
data, p(S|N, T, f3). However, it is straightforward to cal- 
culate the likelihood function for the case of fixed cali- 

5 The notation 9 is used as a short-hand way of denoting the set 
of parameters for the distribution of N, T, and 0. Once we choose 
a particular distribution, then 8 contains the parameters for this 
distribution. At this point the derivation is still general so 9 is left 
unspecified. Later we will model the distribution of log N, log T, 
and /3 as a Student's t-distribution, so 9 = (/i, E), where fi is the 
mean vector and S is proportional to the covariance matrix. 
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Fig. 1. — The scaled Inverse-^ 2 distribution with d = 3 degrees of freedom (left), and the Student's t-distribution with d = 3 degrees of 
freedom (right, dashed line) compared with a Gaussian distribution (solid line). The Student's t-distribution is appropriate when the errors 
are assumed to be Gaussian, but the uncertainty on the variances of the errors can be modeled as a scaled Inverse-x 2 distribution. The 
additional uncertainty on the amplitude of the errors is reflected in the thicker tails of the t-distribution, making the t-model more robust 
against outliers. In this work we model both the measurement noise and calibration uncertainties as following a Student's t-distribution 
with d mcas = 3 and <i ca i = 3 degrees of freedom for the measurement noise and calibration uncertainties, respectively, as the amplitudes of 
both the the noise and calibration errors are only estimated; the t-model therefore makes our method robust against outliers. 

bration uncertainty, p(S|N, T, /?, S). For our t model for 
the measurement noise, the likelihood function at fixed 
5 is 



p(Sij\Ni,Ti,pi,5j) oc 



1 



1 



{S ij -5 j S Vi {N i ,^,T i )f 



-(dmea S + l)/2 



.(8) 



The actual measured data likelihood 
p(S|N,T,/3), is then obtained by 
p(S|N,T,/9,<y) over the distribution of 8 



function, 
averaging 

There- 
fore, in order to derive Equation (|7[) . we first start with 
the posterior distribution that we would have obtained 
if we treated the calibration uncertainties as additional 
parameters: 



consider this to be a reasonable choice, but as with the 
measurement model, our results are not strongly affected 
by the choice of d. Under the t-model, one can now use 
Equations ([8"T)-(fTT]) in combination with Equation (|4]) and 
a prior distribution p(8) to calculate the posterior distri- 
bution. 

In this work we assume a uniform prior on /i. For 
the prior on E, we use the so-called 'sepa ration strategy' 
prior developed by Barnard et al.l J2000). This prior is 
based on the decomposition 



SRS, 



(12) 



P (N,T,/3,e,log8\S) 



p(9) 



iJp(iV i ,T i ,ft|0) 



n 



Then, Equation ([7|) is obtained by integrating Equation 
((9|) over log 8j for j = 1, . . . , m. 

In this work we model the distribution of logiV, logT, 
and /3 as a multivariate Student's t-distribution with d = 
8 degrees of freedom: 



pQagN^logTiJilfrE 

1 



1 



d (x * 



|E|V2 

x i = (log JV 4 , logT,,/} 



Here, x T denotes the transpose of x, 9 — (/i, S), /i is the 
model mean value of (log N, log T, $) , and E is propor- 
tional to the model covariance matrix of (log N, log T, (3). 
Our reasons for using d = 8 are similar to the case of 
modeling the measurement errors; we want to use a dis- 
tribution that is robust against outlying values of N, T, 
or /?. A value of d = 8 implies that we expect about 
~ 1.6% of the data to be outliers by more than 3a. We 



where S is the diagonal matrix of standard deviations 
and R is the correlation matrix. The seperation strat- 
egy works by placing independent priors on the standard 
deviations and correlations. In this work we place a nor- 
mal prior on the elements of log S centered at the val- 
ues inferred from the x 2 estimates with variance equal 
" ]to 100; this is an extremely broad prior giving nearly 

p(log 8j ) TT p(Sij \Ni,Ti,/3i, 8j X ^qual weight to most reasonable values of the dispersio n 
f = i Jin log N, /?, and log T. Following Barnard et alJ pOOOTh 

we place an inverse- Wishart prior on R with four de- 
grees of freedom. Under this choice of prior, the marginal 
prior distributions for the correlations between log N,/3, 
and logT are uniform over [—1,1], reflecting our prior 
assumption that all values of the correlations are equally 
likely. 

The traditional non-hierarchical methods can be de- 
rived as a special case under our hierarchical Bayesian 
model. When the errors are assumed to be Gaussian 
(d — > oo), one ignores calibration uncertainties (Sj = 1), 
and when one assumes that log N, log T, and j3 are in- 
dependently and uniformly distributed over all possible 
values (E — > oo), the posterior distribution becomes 



M) T E -1 (xi-M) 



-(d+3)/2 



(10) 

(11) 



p(N,T,/3|S)c<nexp(-X 2 /2), 



where 



E 



£>ij £>Vj (-'Vj, Pi, ±i 



-1 2 



(13) 



(14) 
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If the estimated values of N, T, and (3 are constrained 
to fall within some range, say < (3 < 5, then Equation 
(|13[) is derived from the assumption of a uniform distri- 
bution over this range. Under Equation ()13|) the values 
of JVj, Tj, and j3i are independent in their posterior prob- 
ability distribution, and thus their best-fit values can be 
obtained independently for each pixel. The x 2 -based es- 
timators are those that maximize the posterior distribu- 
tion under the simplifying assumptions that the errors 
are Gaussian, that there are no calibration uncertain- 
ties, and that logiV, logT, and (3 are independently and 
uniformly distributed over all possible values. Similarly, 
non-hierarchical Bayesian methods based on Equation 
IT3l) also make these same assumptions (|Paradis et al.l 
l2010f h However, it is not true that all three of these 
assumptions hold. In particular, the assumption that 
log N, log T, and (3 are independently and uniformly dis- 
tributed over some range of values is a very strong as- 
sumption and leads to estimates of these quantities that 
are independent for each pixel, and thus the estimated 
quantities are overdispersed compared to their true val- 
ues. This overdispersion biases the inferred correlation 
between (3 a nd T toward an anti-c orrelation, such as that 
described bv lShettv et al.l (|2009a| ). Indeed, the very fact 
that an anti-correlation between j3 and T is inferred from 
such methods shows that their assumption of statisti- 
cal independence is violated, as statistically independent 
quantities must be uncorrelated. Therefore, such meth- 
ods are not self-consistent. Our hierarchical Bayesian ap- 
proach significantly ameliorates these problems by using 
a more realistic model for the distribution of log N, log T, 
and /3, for the measurement errors, and by including the 
calibration uncertainties. 

The posterior distribution, given by Equation 0, com- 
pletely summarizes our information on N, T, /3, /i, and S. 
We can then use Equation (0 to compute estimates of 
these parameters, and summarize our uncertainty on fi 
and S. However, there are a large number of parame- 
ters involved in Equation J7|, as each source is assumed 
to have its own value of N, T, and (3. Therefore, we 
will not be able to compute Equation Q on a grid, as 
there are 3n + 9 free parameters. Because of this, we em- 
ploy MCMC methods to obtain a set of random draws 
of N, T, /?, /i, and E from the posterior distribution. In 
addition, it is difficult, if not impossible, to analytically 
integrate Equation ([9]) over each Sj, which is necessary 
in order to compute Equation yj. For large values of n, 
it is computationally intensive to do the integration over 
log 5j numerically because of the product over the n data 
points. In order to make the computation of the posterior 
distribution tractable, we instead consider the unknown 
values of the calibration errors, 5j, to be additional pa- 
rameters and work directly with Equation (|9|) instead 
of Equation (0 . Under this strategy, the values of log Sj 
are additional parameters that are also estimated at each 
stage of our MCMC sampler. However, because the val- 
ues of the calibration errors are of no scientific interest 
(i.e., Sj is a 'nuisance' parameter), we simply discard the 
values of log Sj returned by our MCMC sampler, thus 
marginalizing over them. The result is a set of random 
draws of N, T, /3, and 9 from Equation ([7]). 

We construct our MCMC sampler using a combina- 
tion of Metropolis-Hastings updates and Gibbs sampling, 
where the Gibbs updates are used whenever possible. In 



addition, to remove degeneracies in some of the param- 
eters, and thus to increase the efficiency of our MCMC 
sampler, we e mploy an Ancillari ty-Sufficiency Interweav- 
ing Strategy (|Yu fc Mengll2011l ) with respect to the cal- 
ibration uncertainties. Implementing the interweaving 
strategy is necessary as we could not get our MCMC 
sampler to converge without implementing it. F urther 
techni cal details of our MCMC sampler are given in lKellvl 
(|2011|) . All statistical inference is then done using the 
random samples of N, T, /3, /i, and £ generated from our 
MCMC algorithm. 

Before concluding this section, we wish to make a 
comment on the sensitivity of the Bayesian fits to the 
assumed population model. The true distribution for 
log N, log T, and f3 is unlikely to follow a ^-distribution 
(or Gaussian, for that matter), but it is unlikely that er- 
rors due to this mismatch will have a significant effect on 
our results. This is because we are primarily interested 
in the moments of the data (e.g., the correlation between 
T and /?), and simple models such as the ^-distribution 
often enable us to adequately recover them. Moreover, 
most of our analysis relies on analyzing the values of 
Ni,Ti, and f3i returned by our MCMC sampler, which 
have their values 'corrected' relative to the x 2 -based es- 
timates under the assumption that they come from a 
common distribution, which we assume can be approxi- 
mated as a i-distribution. The degree to which the values 
of N, T, and f3 are corrected for any given pixel depends 
on the S/N of that pixel. When the S/N is high, the 
correction is small as the information on the values of 
Ni,Ti, and (3i is dominated by the information from the 
data. In this case, the estimated values of N^Ti, and 
f3i for that pixel are insensitive to the choice of model 
for the joint distribution and will be similar to those ob- 
tained by minimizing \ 2 ■ By focusing our analysis on the 
values of N, T, and (3 that are returned by our Bayesian 
approach, instead of the values of /x and S, we are less 
sensitive to differences between our model and the true 
distribution. In general, so long as the distribution of 
N, T, and (3 does not exhibit multiple modes seperated 
by large distances, the scientific conclusions should not 
be very sensitive to error due to mispecifying the statisti- 
cal model. We further explore this issue in the following 
section. 

3. TESTS OF HIERARCHICAL BAYESIAN METHOD ON 
SIMULATED DATA 

In order to illustrate the effectiveness of our Bayesian 
approach, and its improvement over traditional x 2_ based 
techniques, we apply our method to two simulated data 
sets. The first simulated data set is a source with mean 
T ~ 20 K, and the second is a source with mean T ~ 80 
K. We simulate values of the quantity C — NQ,kq from 
a mixture of two log-normal distributions: 

p(logC) = 7r0(logC|G\,« 2 ) + (1 - vr)0(logC|C 2 ,« 2 ). 

(15) 
Here, 0(logC|C, v 2 ) denotes a Gaussian distribution 
with mean C and variance v 2 as a function of log C._For 
boththe cooler and warmer source we set it — 0.4, Ci = 
7.6, Ci = 8.6, vi = 0.4, and t>2 = 0.15. These values were 
chosen to give column densities similar to that observed 
for CB 244 as observed by Herschel, which we analyze in 
§ |4j We simulate values of temperature from a mixture 
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of two Gamma distributions: 

p(T) = TrT(T\ki,Tf) + (1 - ir)T(T\k 2 ,T;). (16) 

Here, F(T|A:, T*) is a Gamma distribution with shape pa- 
rameter k and scale parameter T* as a function of T; note 
that the mean and standard deviation of the Gamma dis- 
tribution is T*k and T*v~h~, respectively. For the cooler 
simulated source we set k\ = 500, T* — 0.03, k 2 — 100, 
and T 2 * = 0.2. For the hotter simulated source we set 
fci = 500,7? = 0.15, k 2 = 1000, and T 2 * = 0.08. Finally, 
for both sources we simulated values of P at fixed temper- 
ature from a Gaussian distribution with mean A+B log T 
and standard deviation 0.1. We set B = 0.5, and A is 
chosen to give a mean value of (/3) = 2. In our simu- 
lations p and T are weakly and positively correlated by 
construction. 

We note that our model of a Student's ^-distribution 
for the joint distribution of log N, log T, and j3 is violated 
in these simulations. In fact, the distribution of log JV is 
bimodal, as is the distribution of logT for the cooler 
simulated source. These simulations will provide a good 
test of the sensitivity of our results to our assumption of 
a Student's ^-distribution for (log N, log T, (3) . 

We simulated flux values for n = 1000 data points at 
observational wavelengths A = 100, 160, 250, 350, and 500 
(Una, corresponding to those employed by the PACS and 
SPIRE instruments onboard Herschel. We multiplied the 
flux points in each band by a constant calibration error; 
the calibration errors were the same for every pixel in a 
given band, but differed over the five bands. The cal- 
ibration errors were drawn from a log-normal distribu- 
tion with standard deviations of 2.75%, 4.15%, 7%, 7%, 
and 7% for each band, respectively. These values were 
chosen to be equal to the official calibration uncertainties 
for PACS and SPIRE for a point source. To all fluxes, 
we also added noise drawn from Gaussian distributions 
with standard deviations <Tj — [2.2, 3.3, 5.2, 3.7, 2.2] x 
10~ 5 Jy arcsec -2 for each of the five bands, respectively. 
These values were chosen to be simila r to those observe d 
in the Herschel observation of CB244 (jStutz et al.l l2010), 
which we apply our method to in § 14.21 see § 14.11 for 
further details. The signal-to-noise distribution for the 
source with mean T ~ 20 K is similar to that of the Her- 
schel observations of CB244. For this source, most of the 
pixels have uncertainties dominated by the measurement 
noise, while the high S/N have uncertainties dominated 
by the calibration errors. However, most pixels for the 
source with mean T ~ 80 K have uncertainties domi- 
nated by the calibration errors. Therefore, the warmer 
source also provides an interesting test regarding the im- 
portance of accounting for the calibration uncertainties. 

We applied both our Bayesian method and a x 2 -based 
method to the two simulated data sets. For the x 2 fits, we 
constrained the best-fit temperature values to be between 
1 < T < 100 for the cooler source and 1 < T < 300 for 
the warmer source. We constrained the column density 
to be positive and (3 to lie within — 10 < f3 < 10 for both 
sources. For each data point, we chose five random inde- 
pendent initial guesses for N, T, and /3, and ran our x 2 
minimizer on each, keeping the value that minimized \ 2 
over the initial guesses. It is necessary to randomly ini- 
tialize the x 2 minimizing algorithm at multiple starting 
locations because the algorithm did not always converge, 



or there may have been local minima; this is mostly a 
problem for the low S/N data points. For the cooler 
source, we remove 104 data points for which the % 2 fits 
converged to a value on the boundary. For the warmer 
source, we only exclude 2 data points which converged 
to the boundary. 

In Figures[2]and[3]we show a random draw of the values 
of p and temperature for each pixel from their posterior 
probability distribution for the source with mean T ~ 20 
K and T ~ 80 K, respectively. We plot a random draw of 
j3 and T instead of the posterior median or mean because 
the random draw of j3 and T more accurately reflects the 
spread of the data in the /3-T plane. While the posterior 
median or mean values provide better estimates of j3 and 
T for any individual pixel, the distribtion of their values 
does not reflect as good of an estimate of the distribution 
of p and T, as they average over the intrinsic variability 
in these quantities that is present in every draw from the 
posterior distribution. In both Figures we also show the 
distribution of the best-fit values obtained from minimiz- 
ing x 2 ■ I n agreement with lShettv et all (|2009af ). we find 
that the distributions for both /3 and T of the x 2 -based 
estimates are wider than the true distributions. Fur- 
thermore, the distribution of the x 2 '-based estimates for 
P and T portrays an anti- correlation between these two 
quantities, despite the fact that these two quantities are 
constructed to be positively correlated in our simulations. 
This occurs because the x 2 -based estimate of the /3-T 
relationship is always biased toward an anti-correlation, 
as the errors in p and T estimated by minimizing x 2 are 
anti-correlated. For this simulation the errors on the % 2 - 
based estimates are large, and the magnitude of this bias 
is large enough to reverse the sign of the correlation. 

In Table Q] we compare the true values of the Spear- 
man's rank correlation between p and T for both simu- 
lations, denoted as p, with those obtained from the x 2 - 
based estimators and that obtained from our hierarchical 
Bayesian method. In addition, in Table[T]we compare the 
average values for T and p, and compare the values of 
the correlations and standard deviations of the best-fit 
parameters; for the Bayesian estimates we use the values 
of the correlations and standard deviations derived from 
the model covariance matrix to investigate how well the 
model covariance matrix recovers the true values. Unlike 
the x 2 -based estimators, the best-fit estimates derived 
using our hierarchical Bayesian method provide a more 
faithful reconstruction of the intrinsic distribution of p 
and T. In particular, the Bayesian estimates recover the 
true positive correlation between p and T . 

Our Bayesian method correctly recovers the true values 
of the means and correlation coefficients within the un- 
certainties. The only exception to this is the mean tem- 
perature for the warmer source, for which our Bayesian 
method ovestimates the true value by ss 4 K, a differ- 
ence of about 5%. This bias is most likely caused by 
our incorrect assumption of a Student's ^-distribution 
for the intrinsic distribution of (log N, logT, /?). How- 
ever, this bias is very small, and in general our Bayesian 
approach recovers the correct values, illustrating the ro- 
bustness of our method to inaccuracies in the assumed 
distribution. In contrast, the distribution derived from 
the x 2 -based estimates is biased and leads to incorrect 
conclusions. In addition to a spurious anti-correlation 
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Fig. 2. — Distribution of the true values for j3 and T (blue squares) for a simulated source with mean T ~ 20 K, compared with a random 
draw from the posterior distribution using our Bayesian hierarchical model (red triangles) and x 2 -based (green open circles) estimates. 
Also shown are the marginal distributions for temperature (top) and /3 (right) for the true values (blue), Bayesian estimates (red), and \ 2 
estimates (green). For clarity one data point with a x 2 -based estimate of T > 60 K is excluded. The Bayesian estimates more accurately 
recover the true distribution of /3 and T and their correlation, while the x 2 -based estimates incorrectly show an anti-correlation and exhibit 
some bias in estimating the average /3. 
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Fig. 3. — Same as Figure [2] but for a simulated source with mean T ~ 80 K. For clarity, two sources with x 2_ based estimates of 
T > 150 K are excluded. As with the cooler source, our hierarchical Bayesian estimates provide a better reconstruction of the true 



distribution, although they exhibit a small bias in the estimated average temperature, 
weak anti-correlation between /3 and T, and exhibit some bias in the average values of / 
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TABLE 1 

Comparison of Performance of x 2 -based Estimators with Hierarchical Bayesian Estimates on 

Simulated Data with T ~ 20 K and T ~ 80 K 







Cooler Source 




Warmer Source 




True 


x 2 


Baycs 


True 


x 2 


Bayes 


p a 


0.33 


-0.45 ± 0.03 


0.23 ± 0.08 


0.10 


-0.34 ± 0.03 


0.07 ± 0.02 


fl b 


2.0 


2.26 ± 0.03 


1.90 ± 0.07 


2.0 


2.16 ± 0.01 


1.96 ± 0.04 


T c 


18.0 


17.8 ± 0.17 


18.3 ± 0.3 


78.0 


67.8 ± 0.25 


81.9 ± 0.18 


Corr(logiV„3) d 


-0.02 


-0.61 ± 0.02 


0.00 ± 0.06 


-0.01 


-0.07 ± 0.03 


0.01 ± 0.04 


Corr(logT,,3) e 


0.32 


-0.69 ± 0.02 


0.38 ± 0.03 


0.09 


-0.62 ± 0.02 


0.08 ± 0.03 


Corr(logiV,logT) f 


-0.04 


0.21 ± 0.03 


-0.06 ± 0.03 


-0.01 


0.00 ± 0.03 


-0.01 ± 0.04 


(T(logN)e 


0.55 


0.59 ± 0.01 


0.58 ± 0.02 


0.56 


0.58 ± 0.01 


0.60 ± 0.01 


<r(P) h 


0.10 


0.90 ± 0.02 


0.111 ± 0.001 


0.10 


0.15 ± 0.01 


0.11 ± 0.01 


a(logT)' 


0.07 


0.109 ± 0.002 


0.083 ± 0.001 


0.02 


0.04 ± 0.001 


0.03 ± 0.001 



a The value of Spearman's rank correlation coefficient between j3 and T for the simulated sources. 

The average value of /3 for the simulated sources. 
c The average value of temperature for the simulated sources. 

The model value for the correlation between log TV and /3 
e The model value for the correlation between log T and /3 

The model value for the correlation between log AT and log T 
s The model value for the standard deviation in logiV. 

The model value for the standard deviation in 0. 
1 The model value for the standard deviation in log T. 
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between j3 and T, the x 2 -based estimates also exhibit 
biases in the estimated mean values of (3 and T. The 
X 2 -based method correctly recovers the mean tempera- 
ture for the cooler source, but overestimates the mean 
value of (3 by « 13%. The bias in the mean values is 
more noticeable for the warmer source, where the fluxes 
only sample the Rayleigh- Jeans regime. Here, the mean 
13 is overestimated by « 8% while the mean temperature 
is underestimated by sa 15%. The most likely source of 
this bias is the calibration errors, which the \ 2 method 
does not account for. In contrast, our Bayesian approach 
corrects for the calibration errors, and incorporates their 
contribution to the uncertainty in the parameters esti- 
mates. 

Our hierarchical Bayesian method also does a better 
job of recovering the true values of JV, and T for indi- 
vidual pixels. For the cooler source the median absolute 
values of the error in the Bayesian estimates for logJV, 
and T are 0.05, 0.11, and 0.59, respectively. For the 
X 2 estimates, these values are 0.08, 0.25, and 1.08 for 
logJV, /?, and T. The Bayesian posterior median esti- 
mates do a factor of sw 2 better that the x 2 estimates 
for the source with mean T ~ 20 K. For the source with 
mean T ~ 80 the Bayesian fits for individual pixels also 
did better, but this time the error in the \ 2 estimates 
are dominated by the bias caused by the unaccounted- 
for calibration errors. 

As a sanity check, we also show that our hierarchical 
Bayesian method recovers an anti-correlation when one 
exists. We simulate values of 0T, and JV in exactly the 
same manner as for the source with mean T ~ 20 above, 
but this time enforce an anti-correlation by using a value 
of B = —0.5. We then apply our hierarchical Bayesian 
method to the simulated data, and obtain estimates via 
X 2 -minimization. Figure |4] compares the results. For 
this simulation, the true value of the Spearman's rank 
correlation coefficient between f3 and T is p = —0.28. 
The value inferred from our Hierarchical Bayesian es- 
timate is p = —0.27 ± 0.02, while the value inferred 
from the x 2 -based estimates is p — —0.38 ± 0.03. Both 
the Bayesian and \ 2 method recover the anti-correlation. 
However, as expected the \ 2 estimates produces an arti- 
ficially stronger and more extended T— f3 anti-correlation 
(see Section 2.1). 

As a final test of our method we generate fluxes from an 
idealized simple model of a starless core. We construct 
a proje cted core similar to Model 1 from iShettv et al.l 
(|2009bf ). based on work bv lEvans etUI fl2001|). In this 
model, TV increases linearly toward the central regions, 
ranging from 2 x 10 21 to 1.25 x 10 22 cm~ 3 . Correspond- 
ingly, T decreases linearly, from 12 to 8 K. Lastly, j3 
decreases from 2.6 to 1.8. Our choice of the quantitative 
values for T, JV, and beta are motivated by the results 
we obtain from CB244 (see § I4.2|) . an d are generally con - 
sistent with the trends discussed by f Ev ans et al.l [2001). 
This model provides a test of how our method performs 
in the limiting scenario where the values of JV, T, and f3 
all lie on simple deterministic curves which do not have 
any intrinsic dispersion. We do not consider this model 
to be realistic, as a real cloud is unlikely to be spheri- 
cally symmetric with the line-of-sight averaged values of 
JV, T, and /3 lying along deterministic curves. However, 
it does provide a useful test of our method as this model 
may violate the assumption that the covariance matrix 
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Fig. 4. — Distribution of true values of /3 and T (blue square 
points) for a simulated data set with an intrinsic anti-correlation 
enforced, compared with the distribution of the x 2 -based estimates 
(green open circles) and a random draw from the posterior dis- 
tribution under our hierarchical Bayesian model (red triangles). 
Both methods are able to recover the anti-correlation, but the x 2 - 
based estimates produce a stronger and more extended T — /3 anti- 
correlation compared to the true distribution. 

of (log JV, log T, 0) is positive definite. 

For this model we simulate flux values as before, 
and execute both our hierarchical Bayesian method 
and the x 2 fit- The true correlation coefficients 
are Corr(log JV, /3) = -0.98, Corr(logJV,logT) = 
-0.97, and Corr(logT, /3) = 0.99. Our hierarchical 
Bayesian method returns values of Corr(log JV, 0) — 
-0.95 ±0.03, Corr(logJV,logT) = -0.98 ± 0.001, and 
Corr(logT,/3) = 0.88 ± 0.04, while the x 2 -based esti- 
mates infer Corr(log JV, 0) = -0.41, Corr(log JV, log T) = 
—0.31, and Corr(log T, 0) — —0.71. Our hierarchical 
Bayesian method again outperforms the % 2 approach for 
this test, although it does not perform as well as in pre- 
vious tests. In addition, as with the other tests the \ 2 
method incorrectly produces an anti-correlation between 
13 and T. 

For this test, convergence of our MCMC sampler 
proves to be extremely slow due to the strong depen- 
dencies among the values of JV, and T and E. Because 
the correlations for this test are \r\ sw 1, N,j3, and T 
for each pixel are very precisely determined from \x and 
E. However, given JV, 0, and T for each pixel, p and E 
are very precisely determined. This dependency results 
in very slow convergence of our MCMC sampler, and the 
decreased performance is likely due to this lack of conver- 
gence. While this does not affect the qualitative results, 
convergence should be carefully monitored when there is 
evidence that the correlations from E converge to 1 or 
-1. Future additions to our MCMC sampler will improve 
convergence for cases where the correlations among JV, 
and T are extremely tight. 

4. APPLICATION: DUST IN STAR FORMING BOK 
GLOBULE CB244 

In this section we discuss the application of our Hi- 
erarchical Bayesian method to the Bok globule CB244. 
The purpose of this application is to illustrate how our 
method performs on real data, to compare the results 
from our method with those obtained from \ 2 minimiza- 
tion, and to illustrate what type of conclusions might be 
derived from our method as compared to traditional non- 
hierarchical methods. The application of our method to 
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CB244 is valuable as an illustration because real data is 
often subject to further complications beyond the simple 
idealized simulations that we have performed, and can 
include a number of systematics that are not captured 
in our statistical model. A full treatment of all data sys- 
tematics is beyond the scope of this methodology-focused 
article, but will be addressed in future work when we ap- 
ply our method to similar astronomical sources. 

4.1. Observations 

CB244 is an isolated, low-mass, star-forming molec- 
ular cloud located at a distance of ~ 200 pc. It con- 
tains two Herschel emission peaks, one associated with a 
Class protostar and one associated with a starless core 
(jStutz et alJl2010T ) . Because of the relative simple geom- 
etry of such sources, Bok globules are excellent targets 
to study the processes taking place in the dense ISM, 
free from complications arrising in more clustered en- 
vironments. The Herschel CB244 data were acquired 
as part of the Guaranteed Time Key Programme "Ear- 
liest Phases of Star-for mation" (EPoS; P.I. O. Krause, 
e.g.,;lBeuther et al.l r 2010: H enning et aLll201CftlLiiiz et al.l 
2010; Stutz ct al. 2010) as part of the Science Demon- 
stration Program. The sources in this program were se- 
lected to be in relatively isolated regions in order to min- 
imize the effects from uncertain back ground subtraction . 
These data were first presented in (jStutz et al.l I2010J) . 
The submm SCUBA 870 /im and th e IRAM 1.3 mm 
groun d-based data were presented in lLaunhardt et all 
(2010). For our purposes of temperature mapping, the 
data reduction has been updated; here we present a brief 
outline of the Herschel processing. See Launhardt et al., 
(2012, in prep) fo r further detai ls. 

The Herschel (jPilbratt et al.l 12010( 1 data for CB244 
were observed with th e Photodetector Arra y Camera and 
Spectrometer (PACS: IPogritsch et al.ll2010H and Spectral 
and P hotometric Imaging Receiver (SPIRE; iGriffin et al.l 
120101) The PACS 100 /zm and 160 /im observations were 
carried out on December 30, 2009. Two orthogonal scan 
maps were acquired using a scan speed of 20"/s with 
scan leg lengths of 9'. The AOR ID's for these obser- 
vations are 134218869(4,5). The two wavelengths were 
processed in an identical fashion. The level 1 data were 
processed using HIPE v. 6.0.1196. Final level 2 maps 
were produced using Scanamorphos (Roussel et al., 2011, 
in prep.), using the "galactic" option. These data were 
processed including the non-zero-acceleration telescope 
turn-around data. The SPIRE 250, 350, and 500 /mi 
observations were obtained on October 20, 2009. Two 
9' scan legs were obtained at the nominal scan speed of 
30"/s. The AOR ID for these data is 1342199366. These 
data were processed up to level 1 with HIPE 5.0.1892. 
The level 2 maps were processed using Scanamorphos 
v.9 (Roussel et al., 2011, in prep.). 

The calibrated dust emission maps were then processed 
in an identical fashion to the data presented in Laun- 
hardt et al., (2012, in prep). We briefly summarize the 
steps used here. First, the data are re-zeroed using a 
m ethod simila r to th at applied to Spitzer MIPS images 
in lStutz et al.l (|2009f ). In order to caculate the DC- level 
offset from the data we identified a 4' x 4' emission-free 
region that appears 'dark' in the Herschel maps. In ad- 
dition, we require that this region is in or near a region 
which is relatively free from 12 CO(2-l) emission (Laun- 



hardt et al. 2012, in prep). The same spatial region was 
used for all five Herschel maps. For each band, we then 
calculate the representative flux level in the region by 
implementing an iterative Gaussian function fitting and 
sigma-clipping scheme to the pixel value distribution at 
each wavelength. The mean value of the best-fit Gaus- 
sian function is subtracted from each image, while the 
standard deviation is used to estimate the noise levels. 
If the noise in the image is Gaussian, then the distri- 
bution of measured flux values for pixels with true flux 
near the DC level will also be Gaussian. In this case, 
the mean of the best-fit Gaussian function provides an 
estimate of the DC level, while the standard deviation 
of the best-fit Gaussian function provides an estimate of 
the noise amplitude. The estimated DC levels and noise 
levels for each Herschel map are reported in Table O 

A pointing correction is also applied to the PACS im- 
ages relative the the MIPS 24 /im data of the same field; 
in the case of CB244 this correction is small (of order 
1") compared to the SPIRE 500 tun beam. We apply 
the recent Herschel calibration correction factors to the 
data. The 100 to 500 /im data are then converted to 
common units of Jy/" . The data (including the sub- 
mm ground-based observations) are then convolved to 
the limitin g beam, in this case t he SPIRE 500 /im beam. 
We use the lAniano et al.l (|2011l ) circularized convolution 
kernels for the 100 to 350 /im data, and a Gaussian beam 
approximation for the ground-based sub-mm data. Fi- 
nally, the data are re-gridded to a common coordinate 
system and a pixel scale of 10". 

4.2. Fitting results 

In applying our Bayesian method to the CB244 
dataset, we limit our analysis to those pixels containing 
fluxes in at least each of the five Herschel bands, since 
the SPIRE images have more coverage than the PACS 
images. In addition, to minimize the impact of uncer- 
tainties in the estimated DC-level offset, we limit our 
analysis to those pixels having S/N > 2 as averaged over 
the five Herschel bands, for which the formal statistical 
error in the DC-level offset is negligible. The coverage 
of pixels that we analyze using our hierarchical Bayesian 
method is shown on the 500 /im map in Figure 

We conservatively assume calibration uncertainties of 
15% for each of the Herschel bands, and calibration un- 
certainties of 30% and 20% for the 870 /zm and 1.3 mm 
bands, respectively. The values we adopt are larger than 
the official Herschel Science Center (HSC) values, and 
our reasons are as follows. 

For the PACS instrument, the point-source calibration 
uncertainty is 3% and 5% at 100 /im and 160 /im, respec- 
tively. However, we have used versions of the Launhardt 
et al. (2012, in prep) data reduced with Scanamorphos, a 
pipeline that is better suited for the analysis of extended 
emssion compared to high-pass filtering. High-pass filter- 
ing is another common Herschel image analysis technique 
that removes unknown levels of extended emission and is 
thus better suited for point-source analysis. We note that 
the high-pass filtering technique removes 1/ / noise more 
efficiently than other map processing techniques; there- 
fore our Scanamophos maps may have elevated noise lev- 
els by comparison. 

Because we are most interested in investigating the ex- 
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TABLE 2 
Estimated DC-levels, Noise Amplitudes, and Calibration Errors for Herschel maps of 

CB244 



lOGyrni 



160/jm 



25GVtm 



350/im 



50Gym 



DC-leveP 

Noise Amplitude 13 

Estimated Calibration Errors 



1298 


510.0 


298.9 


169.6 


77.19 


39.52 


75.00 


26.54 


13.44 


7.206 


0.93 ±0.14 


0.90 ±0.09 


1.07 ±0.10 


1.18 ±0.14 


1.26 ±0.19 



a DC-lcvcl in units of fiJy/ , estimated according to the procedure described in § 14. 1| 

Standard deviation in the additive noise in units of fijy/" , estimated according to the procedure 
described in § |4.1| 

c The posterior median and standard deviation for the calibration errors. The calibration errors arc a 
priori assumed to be log-normally distributed with a geometric mean of unity and an uncertainty of 15%. 
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Fig. 5. — Coverage of the CB244 pixels for which we analyzed 
using our hierarchical Bayesian method, compared with the SPIRE 
500 (im map. We analyzed pixels that had coverage in all five Her- 
schel bands and a mean S/N over the Herschel bands of (S/N) > 2. 
In addition, we omitted the protostar in the center of the image 
from our analysis. The grey circle in the lower left corner illustrates 
the size of the SPIRE 500 /im beam. 

tended emission, we conclude that reduction techniques 
that remove extended emission levels, namely high-pass 
filtering, are not robust for out scientific goals even when 
they deliver data products with reduced l/f noise. The 
main remaining uncertainties are most likely caused by 
beam convolution effects (e.g., imperfect kernels) and 
possibly color corrections. These uncertainties are very 
hard to quantify. Therefore our strategy is to adopt an 
inflated calibration uncertainty, meant to represent mul- 
tiple independent source of uncertainty: extended emis- 
sion calibration uncertainties, the uncertainties intro- 
duced by beam convolution, l/f noise, and other uniden- 
tified effects. Thus the final calibration uncertainties we 
use are 15% and are conservative compared to the HSC 
recomended point-source calibration uncertainties. 

For SPIRE the final HSC recommended calibration un- 
certainty is 7%. However, the calibration for extended 
sources is performed by multiplying the calibration for 
point sources by a correction factor (see Section 5.2.8 of 
the SPIRE Observers' Manua0). No additional extended 
source calibration uncertainty is discussed to our knowl- 

6 http : //herschel . esac . esa . int/Docs/SPIRE/html/spire_om . html 



edge, and thus we also assume 15% uncertainty on the 
calibration for the SPIRE maps. These values are con- 
servative compared to the HSC recomended values for 
calibration uncertainties. 

We employed the color corrections reported in the Her- 
schel/ 'SPIRE and Herschel/PACS observer's manuals. 
In general, the col or corrections are small. Following 
iSchnee et al.l (J2010D . we ignore the color correction for 
the A = 870 /im data, and impose a color correction to 
the 1.3 mm data by modifying the effective wavelength 
to be A = 1.1 mm. 

Observed fluxes from the centrally-heated protostel- 
lar region are affected by line-of-sight temperature vari- 
ations, and possibly even optically thick dust. Due to 
these systematic uncertainties, the estimated protostel- 
lar values of N, ft, and T through Eq. [1] are likely to 
be highly erroneous, and may even introduce biases into 
other inferred quantities, such as the mean and covari- 
ance of log N(H), log T, and ft of the whole sample, as 
well as the calibration uncertainties. Accordingly, in our 
analysis we omit the pixels corresponding to the protot- 
star. 

We also estimate N(H),T, and ft based on minimizing 
X 2 . Because the x 2 -based estimates are unstable at low 
S/N , we limit our analysis to those pixels for which the 
average S/N over the Herschel bands is (S/N) > 5. 

The derived relationship between ft and temperature 
for both the Bayesian and x 2 -based estimates are shown 
in Figure HJ The x 2 -based estimates suggest a strong 
anti-correlation between ft and T, as expected. However, 
the Bayesian analysis finds that ft weakly increases as T 
increases, which is the opposite trend compared to the 
X 2 -based estimates. The Spearman's rank correlation for 
the % 2 -based estimates is p = —0.68 ±0.01, while for the 
Bayesian estimates it is p = 0.33 ± 0.04. The correlation 
between ft and T is rather weak, and while ft tends to 
increase with T in the mean, there is a large scatter in ft 
at a given temperature. 

In Figure [7] we show the SED for the pixel in the 
prestellar core with highest N(H), a pixel with (S/N) 
similar to the median value, and a pixel with (S/N) = 5. 
For the prestellar core we find T = 11.6±0.2 K, ft = 
1.88 ±0.13, and\ogN(H) = 22.65 ±0.01 cm" 2 . For the 
prestellar core, the x 2 estimates are T — 10.97 ± 0.14 K, 
ft = 1.61 ±0.05, and log N(H) = 22.35±0.09 cm~ 2 . The 
SEDs are compared with the range of greybody models 
that contain 95% of the posterior probability. In ad- 
dition, we show the SED derived from the x 2 estimates. 
Note that the Bayesian greybody SED models defined by 
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Fig. 6. — The left panel shows the distribution of j3 and T for CB244 from minimizing \ 2 (black open circles) and the random draw 
from the posterior distribution under our hierarchical Bayesian model (red triangles). The inset provides a close-up of the density of the 
distribution, while the right panel shows a close-up of the hierarchical Bayesian values. As expected, the x 2_ based estimates display an 
anti-correlation. However, the Bayesian estimates show a weak positive correlation, and there is a large range in /3 at fixed T. 



the red region do not include the contribution from the 
calibration uncertainties, and thus it is not appropriate 
to compare them directly to the data. 

In order to assess the quality of fit, we compare the 
measured fluxes with those predicted by our Bayesian 
method for each of the three SEDs shown in Figure [7l 
This i s called a "posterior predictive check" ()Rubinlll98l[ 
Il984t iGelman. Mens, fc Sterr] [19961 . and is commonly 
used to assess the goodness-of-fit of a Bayesian model. 
For each random draw from our MCMC sampler, we 
simulate a flux value at each value of v, incorporating 
the calibration error and measurement noise. Through 
this process we build up a distribution of predicted fluxes 
incorporating our uncertainty in the derived parameters. 
The advantage of this approach is that we compare the 
actual measured fluxes (which are considered fixed and 
known) to a distribution of predicted fluxes, instead of 
simply comparing the measured fluxes at each wave- 
length to a single best-fit SED. Because the predicted 
fluxes also have the calibration errors and noise folded 
in, they are the appropriate quantity to compare to the 
measured fluxes to test the quality of the fit, and not the 
model greybody SEDs. Figure [7] also compares the mea- 
sured fluxes with the ranges containing 95% of the values 
predicted from our Bayesian approach. In all cases the 
measured values fall within this range, showing that the 
Bayesian results are consistent with the measured data, 
and therefore provide an acceptable fit. 

We also checked the derived calibration errors and their 
uncertainties, in order to ensure that the derived calibra- 
tions are consistent with those obtained from the data 
reduction; i.e., that the calibration errors are consistent 
with Sj — 1. The estimated calibration errors and their 
uncertainties are also listed in Table [2j The estimated 
calibration errors are consistent with unity, implying that 
we do not find any evidence for significant deviations 
from the calibrations performed in the data reduction 
which might be indicative of data systematics or model 
mis-specification. Moreover, the posterior uncertainties 
in the derived calibration errors are similar to the a pri- 
ori assumed values of 15%, indicating that essentially 
all of the information from the calibration errors comes 
from the prior that we have placed on them. Because of 



this, the fact that our method incorporates the calibra- 
tion errors should not be interpreted as a recalibration of 
the data. Rather, we have included the calibration errors 
as nuisance parameters which are identified as an addi- 
tional source of measurement error. Indeed, it is not the 
absolute calibration which is included in our statistical 
model, but rather the error in the calibration. The pos- 
terior probability distributions that we obtain from our 
MCMC method average over the unknown calibration er- 
rors, thus ensuring that uncertainties in the calibration 
are also reflected in the uncertainties in the derived grey- 
body parameters, as well as reflected in the esti mate d 
means and correlations (also see the discussion in § I4.3[) . 

As an additional test, we perform cross-validation to 
compare the hierarchical Bayesian estimates with the x 2 - 
based ones. For this test, we randomly remove 10% of 
our photometry, and then refit the remaining 90% of the 
CB244 data. The resulting estimates of N(H), /3, and T 
were then used to predict the flux values for the 10% of 
the data that were left out. If the hierarchical Bayesian 
estimates provide a better description of the SED, then 
they should do a better job of predicting the data that 
is omitted from the fit. As a measure of the error in the 
flux, we use the absolute value of the difference between 
the measured flux and the model flux, divided by the 
standard deviation of the noise in that band, <jj. For our 
Bayesian estimates, the median of this error is 0.55, while 
the median for the % 2 estimates is 0.90. The hierarchical 
Bayesian estimates did a factor of ~ 2 better than the 
X 2 estimates in predicting the flux values for data that 
is omitted from the fit. This result suggests that the 
Bayesian estimates provide a better description of the 
SED, and are therefore more reliable than the x 2 ones. 

The temperature and /3 maps of CB244 are shown in 
Figured! along with contours of constant column density; 
all maps were derived from the posterior median values. 
The temperature tends to decrease toward the center of 
the prestellar core, while the column density tends to 
increase toward the center. The /3-map illustrates that 
the values of j5 trace the column density values very well, 
with /3 decreasing toward the central, more dense regions. 
The estimated /3 values become noisy near the central 
dense region of the core, with more drastic spatial vari- 
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Fig. 7. — Measured fluxes (blue stars) for the pixel with the highest estimated column density in the prestellar core (right), a pixel with 
average Herschel S/N similar to the median value (center), and the pixel with average Herschel S/N = 5 (left), which defines the lower 
limit of our S/N cut for the x 2 estimates. The 100/mi flux measurement is missing from the left panel because its value is negative. The 
best-fit greybody SEDs derived from the \ 2 estimates are shown with a dashed black line, while the red regions contain 95% of the posterior 
probability for the greybody SEDs derived from our hierarchical Bayesian method. The measured fluxes are compared with the values 
that are predicted from our Bayesian model (black circles), with the error bars containing 95% of the posterior probability on the measued 
SED. The fluxes and their error bars predicted from our Bayesian model differ from the model greybody SEDs in that they also include 
the effects of the calibration error and noise, and thus it is the green circles that should be compared with the measured data and not the 
red region. The actual measured values of the flux fall within the range expected from our Bayesian model, and therefore our model is 
consistent with the measured data. 



ations. It is unclear why this is the case, although it 
may be related to the breakdown of the assumption of 
optically thin isothermal dust that underpins Equation 

Figure [9] shows the distribution of the CB244 data 
points in the N(H)-f3 plane, using both the Bayesian 
and % 2 -based estimates. For the Bayesian estimates we 
show a random draw from the posterior distribution to 
more faithfully represent the intrinsic scatter in j3 at a 
fixed value of N(H). As expected from the (3 map shown 
in Figure [8j our Bayesian estimates show a tight anti- 
correlation between column density and /3. The anti- 
correlation between j3 and N(H) is also seen with the 
X 2 estimates, but the larger errors in the x 2_ based es- 
timates add considerable artificial scatter to the values 
of ft at fixed N(H), making the correlation appear less 
significant. The correlation can be parameterized as 



(3 = (2.18±0.18)-(0.27±0.01)log 



N(H) 



10 2 



(17) 



The scatter in f3 at fixed N(H) is estimated to have a 
dispersion of ap\ N = 0.040 ± 0.003. This value of <Jp\N 
argues against a constant value of /3 at fixed column den- 
sity. The anti-correlation between (3 and column density 
is highly significant and is observed even if we limit our 
analysis to the highest S/N data. 

The (3 — N(H) anti-correlation obtained from our 
Bayesian method is much tighter than that estimated 
using the \ 2 estimates. However, we caution that it is 
unclear if the small scatter in f3 extends over the entire 
range in column density probed. Because S/N is strongly 
correlated with column density, the pixels with low esti- 
mated column density also have low S/N. For the lower 
S/N pixels, the model for the distribution of /3,N(H), 
and T becomes more informative, and thus the distribu- 
tion of the low N(H) estimates strongly depends on ex- 



trapolation from the distribution of the high N(H) esti- 
mates. Therefore, the distribution of j3 at fixed N(H) at 
low N(H) is primarily estimated by extrapolation from 
the distribution of at fixed N(H) at high N(H). Our 
simple Student's t model fixes the standard deviation of 
the scatter in (3 at fixed N(H) to be constant. Because 
the scatter in (3 at fixed N(H) is small for high N(H), 
where the S/N is high, the Student's t-model therefore 
extrapolates the scatter in (3 at fixed N(H) at low N(H) 
to also be small. 

The increasing influence of extrapolation on our re- 
sults can be seen more clearly in distribution of T at 
fixed N(H), where the dispersion in estimated T at 
fixed N(H) increases down to N(H) ~ 10 21 cm~ 2 , an 
then becomes constant. This therefore suggests that 
the estimated values for pixels N(H) < 10 21 cm~ 2 are 
strongly influenced by extrapolation from the pixels with 
N(H) > 10 21 cm -2 . If we had used a more flexible 
model, such as a mixture of Gaussian functions, then 
the high S/N data would not be as informative about the 
low S/N data and the scatter in j3 at fixed N(H) for the 
low N(H) data would be poorly constrained. Because of 
this, we cannot conclude that the tight anti-correlation 
derived from our hierarchical Bayesian approach persists 
across the entire range of column density probed in this 
analysis, although we do find evidence that it is real at 
N(H) > 10 21 cm~ 2 . Future work will incorporate more 
flexible models for the distribution of f3,N(H), and T. 
However, we also note that the weak positive correlation 
between /3 and T that we observe persists if we limit 
ourselves to only those pixels with N(H) > 10 21 cm~ 2 . 

Figure[9]also shows the distribution of the CB244 data 
points in the N(H)-T plane, using both the Bayesian and 
X 2 -based estimates. As with the N(H)-/3 distribution, 
we use a random draw from the posterior distribution 
for the Bayesian estimates. An anti-correlation between 
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Fig. 8. — Temperature (left) and (right) map for CB244 derived from the Bayesian estimates. The red hashed region in the center of the 
images corresponds to the protostar, which we exclude from our analysis, and the grey circle in the lower left corner illustrates the size of the 
SPIRE 500 (im beam. Overplotted are contours of constant column density, corresponding to N(H) = 10 20 , 5 X 10 20 , 10 21 , 5 X 10 21 , 10 22 , 
and 2 X 10 22 cm -2 . The coolest and most dense region corresponds to the prestellar core, with the temperature decreasing toward its 
center. The /3-map traces the column density map very well, with the values of /3 decreasing toward the central, more dense regions. 
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Fig. 9. — Dependence of /3 (right) and T (left) on column density, N(H). There is a tight and highly significant anti-correlation 
between column density and /3, and there is a weak anti-correlation between column density and temperature. These anti-correlations 
may be evidence for the growth of dust grains in more dense, cooler environments. In both cases the trends are not as obvious for the 
X 2 -based estimates because large uncertainties artificially broaden the inferred distributions. The estimated values of /3 and T based on 
our Hierarchical Bayesian method likely become strongly influenced by extrapolation from high S/N data points at N(H) < 10 21 cm -2 
(shaded region), so we caution that the estimated trends at low N(H) may not be robust. 



column density and temperature is apparent, although 
it is not as tight as the anti-correlation between N(H) 
and p. The anti-correlation between column density and 
temperature is also manifested in the temperature map of 
CB244 (Fig. [5]) , where the temperature decreases toward 
the central denser regions. However, comparison with 
the /3-map in Figure [3] shows that values of (3 ~ 2.2 are 
found both in the warmer region to the east of the core, 
and in the cooler region to the west of the core. These 
results suggest that density, and not temperature, is the 
primary driver behind variations in /?. 

Our analysis also enables us to investigate how (3 de- 
pends on temperature at fixed column density. Although 
we have found that /3 and T are weakly positively cor- 
related in CB244, this is obtained by averaging over the 



distribution oiN(H) for CB244. Therefore, it is not nec- 
essarily true that (3 and T are correlated at fixed N(H). 
We quantify the relationship between j3 and T at fixed 
N(H) by performing a linear regression of (3 simulta- 
neously on both logT and log N(H), which we derive 
from our estimated covariance matrix for j3, log T, and 
log N(H): 



(3 = (2.29 ± 0.18) - (0.29 ± 0.01) log 



10 21 cm- 2 



(0.81 ±0.19) log 



T 
10 K 



(18) 



Interestingly, our results imply that for CB244 (3 and 
T are anti-correlated at fixed column density. However, 
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when averaging over the distribution of column densities 
/3 and T display a weak positive correlation in CB244. 
The scatter in /? at fixed column density and temper- 
ature is (Jf3\N.T — 0.036 ± 0.002, which is only sa 10% 
smaller than the scatter in /3 at fixed column density. 
These results confirm that the variations in /3 are pri- 
marily accounted for by variations in N(H), and that 
temperature is only a minor secondary driver to varia- 
tions in p. 

4.3. Assessing the Impact of Data Systematics 

As stated above, we limit our analysis to those pix- 
els with (S/N) > 2 over the five Herschel bands, with 
the goal of minimizing the effect of systematic error in 
the zero-level of the maps, which would affect the faint 
regions the most. However, it is possible that our re- 
sults are still affected by systematics regarding the es- 
timated zero-level. Such systematics may result from 
uncertainties in both the astrophysical and instrumen- 
tal background levels. In both cases this would result in 
a spatially-correlated systematic error in all of the flux 
values across the map. To assess the impact of errors 
in the estimated zero-level, we perform a few additional 
checks regarding systematics in the DC-levels. However, 
we note that in order to fully realize the impact of data 
systematics on the results for a particular source more 
rigorous simulations should be performed. As this is be- 
yond the scope of this paper, the tests we perform here 
are meant to be illustrative of the impact of data system- 
atics on the Hierarchical Bayesian results, and how one 
might address them in practice. Moreover, we also note 
that unaccounted systematics also affect the x 2_ based 
results in a similar, if not the same, manner. Thus our 
result that the Hierarchical Bayesian method leads to 
opposite conclusions with respect to the x 2 estimates 
regarding the correlations among SED parameters for 
CB244 is robust against these systematics. 

We first assess the impact of systematic uncertainty in 
the zero-levels of the data from ground based bolome- 
ters (i.e., SCUBA and MAMBO). For the ground-based 
data it is unlikely that the uncertainty in the zero-level 
is driven by uncertainty in the background emission, as 
the subtraction of atmospheric emission is taken care 
of in the measurement procedure. The MAMBO and 
SCUBA data are chopped at high frequency, and the re- 
stored dual beam maps automatically subtract off all at- 
mospheric and astrophysical extended background. The 
information on the atmospheric emission and extended 
background are no longer in the data. 

In order to assess the robustness of our derived corre- 
lations to systematics with respect to the ground-based 
data, we redid our anlysis using only the five Herschel 
maps. Using only the Herschel data, the Spearman's 
rank correlation coefficient for the J3—T relationship was 
p = 0.37±0.04 and the slope of the Nh~P anti-correlation 
was —0.19 ±0.01. The f3-T correlation is essentially un- 
affected by the removal of the ground-based data, while 
the Nh~/3 anti-correlation is reduced in magnitude but 
still present. Our results are thus qualitatively robust 
against the zerolevel offsets in data from the groundbased 
bolometers, which frequently result from the spatial fil- 
tering techniques app lied during data reduction (e.g., 
iKauffmann et al.ll2008D . 

As stated earlier, CB244 is part of a larger sample of 



Bok globules chosen to lie in relatively isolated regions. 
These sources were chosen in this manner so as to ensure 
that they have exceptionally low background emission, 
and thus uncertainties in the astrophysical background 
emission should be minimal. The uncertainty in the zero- 
level for the Herschel maps should be driven by the un- 
certainty in the instrumental background. To assess the 
impact of mispecifying the zero-levels of the Herschel 
data, we refit the CB244 data using only the Herschel 
bands after adding Gaussian noise to the logarithm of 
each of the five estimated DC-levels. The standard devi- 
ation in the log-normal noise was 5%. This value is much 
larger than the formal statistical uncertainty on the DC- 
level as estimated according to the procedure described 
in § 14.11 (i.e., on the mean value of the best-fit Gaussian 
function), but it should give us insight into how uncer- 
tainty in the DC-level affects the results. Note that each 
pixel in a map is assumed to have the same DC-level, and 
thus the perturbed DC-levels introduce the same offset 
to each of the pixels in a given map. 

We did not find any significant difference to the correla- 
tions obtained when fitting the maps with the perturbed 
DC levels; however, there is a large difference in the in- 
ferred mean values of /? and T. Using the perturbed data, 
our M_CMC sampler inferred an average value for /3 and 
T of p = -0.26 ± 0.32 and T = 27.3 ± 0.7. This value 
of (3 is inconsistent with a value of /3 ~ 2 inferred from 
our CB244 data and from earlier studies. Moreover, for 
the original CB244 data, the best-fit values of the cali- 
bration errors, <5, were on average 0.7tr away from unity, 
which is consistent with our prior assumption that the 
calibration should be correct on average with an uncer- 
tainty of w 15%. However, for the perturbed CB244 data 
the best-fit values of S were on average 1.7tr away from 
unity. The calibration errors derived from the perturbed 
data set are inconsistent with our assumption that they 
on average should equal unity with a dispersion of 15%, 
correctly suggesting problems with the perturbed data 
set. The likely reason for this is that the error in the 
zero-level is partially being absorbed by the estimated 
values of the calibration errors, 6. 

As a final test, we fix the calibration errors equal to 
unity and refit the data using our Hierarchial Bayesian 
method. This is equivalent to assuming that the cali- 
bration is correct and that there is no uncertainty on 
it. This allows us to test the impact of overestimating 
the calibration uncertainty, as we have conservatively as- 
sumed calibration uncertainties that are larger than the 
official values recommended by the Herschel Science Cen- 
ter. In addition, it allows us a more direct comparison 
with the x 2 results, as the x 2 nTiS ignore the impact of 
calibration errors. Fixing Sj = 1 only resulted in small 
differences in the estimated mean values and correlations 
of (log Njj, T, f3). The derived value of Spearman's rank 
correlation for the j3-T relationship was p — 0.28 ± 0.02 
when we ignored the calibration errors, compared to 
p = 0.33 ± 0.04 obtained when the calibration errors 
are included. The anti-correlation between Njj and /? is 
actually stronger when we ignore the calibration errors, 
having a value of Corr (log iVff,/3) = -0.944± 0.004, com- 
pared to Corr(logiVtf, /3) = -0.786±0.040 obtained when 
we include the calibration uncertainties. 

When we include the calibration errors the derived 
mean values are (logN H ) = 20.90 ± 0.12, (/3) = 1.92 ± 
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0.19, and (T) = 14.85 ± 0.42. However, when we ig- 
nore the calibration errors we find (log Njj) = 21.091 ± 
0.006, (j3) = 1.797 ±0.003, and (T) = 14.153 ± 0.016. 
Using the median of the x 2 -based estimates we find 
(logoff) = 21.126 ±0.005, (0) = 1.787 ±0.002, and 
(T) = 14.265 ± 0.0179. Ignoring the calibration errors 
produces mean values of the SED parameters and their 
uncertainties that are similar to those obtained from the 
X 2 -based estimates. In addition, the uncertainties in the 
means and correlations of the SED parameters are larger 
when we include the calibration errors, as the calibration 
uncertainty is reflected in the much larger uncertainties 
in the SED parameters. 

Changing either the DC level or calibration error re- 
sults in a constant additive or multiplicative offset for 
each map. While these errors can alter the mean values of 
the SED parameters, they do not alter their correlations 
as the correlations are driven by the spatial variations 
of flux values accross the maps, and not by the mean 
flux value in each map. More complicated spatially- 
correlated data systematics may produce biases in the 
inferred correlations among SED parameters. Simula- 
tions should be used to assess the impact on the scien- 
tific results when strong spatially correlated errors are 
thought to be present in a data set. 

5. DISCUSSION 

The application of our hierarchical Bayesian method to 
observed fluxes of CB244 reveals a number of interesting 
features. First, we find that there is only a limited range 
in fi and T, with f3 € (1.8, 2.6), and T G (11, 16) K. 
Second, /3 and T are positively correlated, albeit weakly 
so, suggesting that the strong anti-correlation seen in 
previous work is driven by noise. Further, the Bayesian 
fit suggests that f3 declines towards the central region of 
the starless core, where the temperature decreases and 
the column density increases (Fig. [5]). In fact, the pa- 
rameterization of j3 in terms of N and T in Equation [T5] 
indicates that /3 is more strongly correlated with N than 
onT. 

While we have found a number of interesting features 
in our analysis of CB244, their interpretation is more 
difficult. Strictly speaking, our derived trends are with 
respect to the isothermal greybody SED parameters, 
which are not necessarily equivalent to the correspond- 
ing physical parameters that they are intended to es- 
timate. Thus, it is unclear if our derived correlations 
represent astrophysically-meaningful correlations, or are 
instead driven by systematics involving the data reduc- 
tion, background subtraction, and SED model. Such 
trends may be driven by variations in temperature and 
density along the line-of-sight, which are currently not 
accounted for in our analysis. For example, the appar- 
ent correlation between /3 and Nh cannot be the actual 
physical correlation as there is both low and high-density 
gas along the line of sight. The physical cause could be a 
correlation between volume density and /3. In this sense 
it is also possible that such trends are at least in part 
driven by real astrophysical variations, possibly due to 
the growth of dust grains. In high density compressed 
regions of the ISM, grain sizes may increase due to dust 
coagulation, possibly leading to an increase in f3. Com- 
pared to the ISM values of /3 ~ 2, lower (3^,1 are 
found in numerous studies of protoplanetary disks (e.g. 
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Draind 120061 : iRicci et al.M20111 ). The interpretation is 
that grains in disks are much larger than in the more 
diffuse ISM, and that these grains are the seeds of proto- 
planets. The difference in grain sizes between the large 
scale ISM and protoplanetary disks suggests that during 
some epoch of the star formation process, grains begin 
to grow. While this is an intriguing interpretation of our 
results we stress that we are currently not in a position 
to sort out the contributions to our inferred correlation 
from the systematics and real astrophysical variations; 
future work will address systematic errors resulting from 
our assumptions of optically-thin isothermal dust. 

In order to accurately map the T, /3, or density 
structure of an observed region, line-of-sig ht variations 
must b e taken into account. For example, iShettv et al.l 
(2009b) find that when the model j3 is constant but 
there are temperature variations along the line-of-sight, 
the assumption of isothcrmality produces /3 estimates 
which are inversely correlated to the fitted temperatures. 
Line-of-sight T variations will effect both the hierarchi- 
cal Bayesian and \ 2 fits of Equation (pj to the observed 
fluxes in the same manner. This is because the Bayesian 
and x 2 estimates become equivalent in the limit of infi- 
nite S/N. Because our hierarchical Bayesian model ac- 
counts for the statistical errors, the results obtained from 
it should be interpreted as an estimate of what would 
have been obtained if there is no measurement error. 
It may be that the relationships that we find between 
/3, T, and N(H) are driven at least in part by line-of-sight 
variations, making their astrophysical interpretation dif- 
ficult. 

In addition to biases due to line-of-sight variations, 
our hierarchical Bayesian results may be biased by the 
optically-thin approximation to the SED. We can esti- 
mate the magnitude of this bias using the results from 
the cross-validation test, described in § 14.21 Because the 
100 /jm map should be the most affected by the optically- 
thin approximation, if the optically-thin approximation 
is not valid we might expect the error in the 100 /im 
data that was omitted from the fit to be systematically 
under- or overestimated. We did not notice any signifi- 
cant offset in the cross-validation error for either the \ 2 
or hierarchical Bayesian estimates. Moreover, under our 
assumption of ko = 0.009 cm 2 /g with Vq = 230 GHz, 
we estimate the optical depth at 100 jjxa in the core to 
be t s=s 0.05. Therefore we do not find any significant 
evidence that the optically-thin approximation is having 
a strong affect on our results. 

We can be confident that statistical uncertainties 
which lead to spurious and pronounced T — /3 anticor- 
relations are appropriately handled in the hierarchical 
Bayesian method, and thus our inferred correlations are 
statistically significant. However, systematic errors due 
to mispecification of the SED model, such as line-of- 
sight variations, also affect our hierarchical Bayesian re- 
sults. There may also be difficulties with the data reduc- 
tion that can introduce spatially-correlated systematic 
errors, such as unidentified background emission from as- 
trophysical sources, which in turn can bias the inferred 
correlations. Therefore at this time we cannot disenta- 
gle systematic effects from real physical effects in our 
inferred correlations. Nevertheless, because the hierar- 
chial Bayesian fits rigorously and correctly account for 
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the statistical errors, we are now in a position to isolate 
the effects of systematic errors on the scientific conclu- 
sions. A thorough analysis of possible approaches to ac- 
count for line-of-sight variations which does not rely on 
the optically-thin approximation will be investigated in 
a future publication. We will also apply our method to a 
large sample of starless cores in order to investigate if the 
trends derived here for CB244 extend to a larger sample, 
providing a more thorough treatment of data systemat- 
ics. 



6. SUMMARY 

We have developed a hierarchical Bayesian method 
that rigorously treats measurement errors for fitting 
single-temperature greybody SEDs. The Bayesian 
method provides a probability distribution for the values 
of temperature T, spectral index j3, and column density 
N in each pixel (or source) , conditional on the measured 
data, as well as for the distribution of these parameters 
over an entire map (for a resolved source) or survey (for 
multiple unresolved sources). In testing the hierarchical 
Bayesian method on model sources, we demonstrate that 
it can accurately recover the true parameters and corre- 
lations, whereas the \ 2 fit produces an artificial T — (3 
anti-correlation due to the degeneracy between T and j3. 

We have applied our hierarchical Bayesian model to 
Herschel and ground-based observations of the Bok glob- 
ule CB244. The Bayesian fit estimates j3 e (1.8, 2.6), 
T € (11, 16) K, which is significantly more constrained 
than the \ 2 estimates. Further, we find that /? and T 
are weakly positively correlated, in direct opposition to 
the \ 2 results. We have mapped out the spatial distri- 
bution of T, 0, and Nu, and the correlations between 
these properties. We find that j3 decreases from ~ 2.6 
where Nh ~3x 10 19 cm~ 2 , to ~ 1.8 in the densest re- 
gion of the starless core, where Nh > 10 22 cm~ 2 . While 



these results may be at least partially driven by system- 
atics regarding the data reduction and the modeling, our 
method properly corrects for the statistical uncertainties, 
illustrating that the x 2 results are significantly affected 
by noise. 

Due to the accuracy of the hierarchical Bayesian 
method, and its estimate of a positive correlation be- 
tween T and /3, it may be used to assess any T — (3 
anti-correlation found from % 2 fits. Our analysis demon- 
strates that hierarchical Bayesian methods can accu- 
rately estimate the dependence between SED parame- 
ters, and therefore may be used to further understand 
grain evolution in the ISM. 
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